
[SPARK-3791][SQL] Provides Spark version and Hive version in HiveThriftServer2 #2843

Closed
wants to merge 4 commits into from

Conversation

liancheng
Contributor

This PR overrides the GetInfo Hive Thrift API to provide correct Spark version information. Another property, spark.sql.hive.version, is added to reveal the underlying Hive version. These are generally useful for Spark SQL ODBC driver providers. It also takes the chance to remove the SET -v hack, which was a workaround for Simba ODBC driver connectivity.
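Conceptually, overriding GetInfo amounts to mapping Thrift info-type requests to server-side strings. A minimal sketch of that mapping (this is not Spark's actual implementation; the method shape and version literal are made up, though CLI_SERVER_NAME, CLI_DBMS_NAME, and CLI_DBMS_VER are real TGetInfoType values in the HiveServer2 Thrift API):

```java
// Conceptual sketch of the GetInfo mapping described above. Not Spark's
// actual code: the enum mirrors a few real TGetInfoType values, and the
// version string is a placeholder.
public class GetInfoSketch {
    enum GetInfoType { CLI_SERVER_NAME, CLI_DBMS_NAME, CLI_DBMS_VER }

    static String getInfo(GetInfoType type, String sparkVersion) {
        switch (type) {
            case CLI_SERVER_NAME:
            case CLI_DBMS_NAME:
                // Identify the server as Spark SQL rather than Hive.
                return "Spark SQL";
            case CLI_DBMS_VER:
                // Report the Spark version instead of the Hive version.
                return sparkVersion;
            default:
                throw new IllegalArgumentException("Unrecognized info type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(getInfo(GetInfoType.CLI_DBMS_VER, "1.2.0")); // prints "1.2.0"
    }
}
```

An ODBC driver's SQLGetInfo call ultimately resolves through this kind of dispatch, which is why the override matters to driver providers.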

TODO

  • Find a general way to figure out Hive (or even any dependency) version.

    This blog post suggests several methods to inspect application version. In the case of Spark, this can be tricky because the chosen method:

    1. must apply to both the Maven build and the SBT build

      For Maven builds, we can retrieve the version information from the META-INF/maven directory within the assembly jar. But this doesn't work for SBT builds.

    2. must not rely on the original jars of dependencies to extract a specific dependency version, because Spark uses an assembly jar.

      This implies we can't read the Hive version from the Hive jar files, since the standard Spark distribution doesn't include them.

    3. should play well with SPARK_PREPEND_CLASSES to ease local testing during development.

      SPARK_PREPEND_CLASSES prevents classes from being loaded from the assembly jar, so we can't locate the jar file and read its manifest.

    Given these constraints, maybe the only reliable method is to generate a source file containing the version information at build time. @pwendell Do you have any suggestions from the perspective of the build process?
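The build-time generation idea can be sketched as follows (a minimal illustration only; the GenerateVersionInfo and SparkBuildInfo names and the version literals are made up, not Spark's actual build code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of a build step that bakes version information into a generated
// source file. Because the versions become compile-time constants, this
// works the same for Maven and SBT builds, needs no dependency jars on
// the classpath, and is unaffected by SPARK_PREPEND_CLASSES.
public class GenerateVersionInfo {
    static String render(String sparkVersion, String hiveVersion) {
        return "public final class SparkBuildInfo {\n"
             + "  public static final String SPARK_VERSION = \"" + sparkVersion + "\";\n"
             + "  public static final String HIVE_VERSION = \"" + hiveVersion + "\";\n"
             + "}\n";
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("gen").resolve("SparkBuildInfo.java");
        Files.writeString(out, render("1.2.0", "0.13.1"));
        System.out.println(Files.readString(out).contains("HIVE_VERSION")); // prints "true"
    }
}
```

In a real build, the generation step would run before compilation and the version strings would come from the build definition rather than literals.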

Update: the Hive version is now retrieved from the newly introduced HiveShim object.

val sparkConf = new SparkConf()
  .setAppName(s"SparkSQL::${java.net.InetAddress.getLocalHost.getHostName}")
  .set("spark.sql.hive.version", "0.12.0-protobuf-2.5")
Contributor Author

This needs to be generalized.

@SparkQA

SparkQA commented Oct 19, 2014

QA tests have started for PR 2843 at commit 9799b50.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 19, 2014

QA tests have finished for PR 2843 at commit 9799b50.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21881/
Test FAILed.

@@ -37,35 +43,81 @@ import org.apache.spark.sql.catalyst.util.getTempFilePath

/**
* Tests for the HiveThriftServer2 using JDBC.
*
* NOTE: SPARK_PREPEND_CLASSES is explicitly disabled in this test suite. Assembly jar must be
* rebuilt after changing HiveThriftServer2 related code.
Contributor Author

This requirement should be OK for Jenkins, since Jenkins always builds the assembly jar before executing any test suites.

@SparkQA

SparkQA commented Oct 19, 2014

QA tests have started for PR 2843 at commit 9799b50.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 19, 2014

QA tests have started for PR 2843 at commit 9799b50.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 19, 2014

QA tests have finished for PR 2843 at commit 9799b50.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 19, 2014

QA tests have started for PR 2843 at commit da5e716.

  • This patch merges cleanly.

@liancheng
Contributor Author

Hm, 3 consecutive random build failures, embarrassing...

For the first one, the unit tests were not started at all; it seems the build process was interrupted somehow. The second failure is a bit weird: although we're already using a random port to avoid port conflicts, it still failed to open the listening port. I checked the TCP port range on the Jenkins master node, which should be valid, but I don't have access to the Jenkins slave node that executed this build. The cause of the third failure is a known bug fixed in the master branch; I just rebased onto the most recent master.
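As an aside, the random-port trick mentioned here is commonly implemented by binding to port 0 so the OS assigns a free ephemeral port. A small illustration (not the test suite's actual code):

```java
import java.io.IOException;
import java.net.ServerSocket;

// Bind to port 0 so the OS assigns a free ephemeral port, avoiding
// conflicts between concurrent test runs. Note the inherent race: the
// port is released when the probe socket closes, so another process
// could in principle grab it before the server under test binds it --
// which may explain occasional failures like the one described above.
public class RandomPort {
    static int freePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        int port = freePort();
        System.out.println(port > 0 && port <= 65535); // prints "true"
    }
}
```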

@SparkQA

SparkQA commented Oct 19, 2014

Tests timed out for PR 2843 at commit 9799b50 after a configured wait of 120m.

@SparkQA

SparkQA commented Oct 19, 2014

QA tests have finished for PR 2843 at commit da5e716.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SerializableMapWrapper[A, B](underlying: collection.Map[A, B])
    • class Predict(
    • case class EvaluatePython(

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21887/
Test PASSed.

}

sql(s"SET ${testKey + testKey}=${testVal + testVal}")
assert(hiveconf.get(testKey + testKey, "") == testVal + testVal)
assertResult(Set(testKey -> testVal, (testKey + testKey) -> (testVal + testVal))) {
Contributor Author

These lines are removed because they were originally for testing the deprecated hql call. At that time, sql and hql had different code paths. Later those hql calls were changed to sql to avoid compile-time deprecation warnings, which made them exact duplicates.

@SparkQA

SparkQA commented Oct 31, 2014

Test build #22610 has started for PR 2843 at commit 2e5aa55.

  • This patch merges cleanly.

@liancheng
Contributor Author

Updated Hive version information inspection. Waiting for #2685 and #2887 to be merged, then this should be ready to go after rebasing.

@SparkQA

SparkQA commented Oct 31, 2014

Test build #22610 has finished for PR 2843 at commit 2e5aa55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22610/
Test PASSed.

@liancheng
Contributor Author

retest this please

@liancheng
Contributor Author

@marmbrus This should be ready to go once Jenkins says OK. Simba ODBC driver needs this change for the SQLGetInfo ODBC API.

@SparkQA

SparkQA commented Oct 31, 2014

Test build #22661 has started for PR 2843 at commit 2e5aa55.

  • This patch merges cleanly.

@liancheng liancheng changed the title [SPARK-3791][SQL][WIP] Provides Spark version and Hive version in HiveThriftServer2 [SPARK-3791][SQL] Provides Spark version and Hive version in HiveThriftServer2 Oct 31, 2014
@SparkQA

SparkQA commented Nov 1, 2014

Test build #22661 has finished for PR 2843 at commit 2e5aa55.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22661/
Test FAILed.

@liancheng
Contributor Author

Fixed the failing tests and rebased onto the most recent master (with full Hive 0.13.1 support).

@SparkQA

SparkQA commented Nov 1, 2014

Test build #22691 has started for PR 2843 at commit aebb848.

  • This patch merges cleanly.

@marmbrus
Contributor

marmbrus commented Nov 1, 2014

Can you please rebase?

@liancheng
Contributor Author

Done rebasing.

@SparkQA

SparkQA commented Nov 2, 2014

Test build #22728 has started for PR 2843 at commit a873d0f.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 2, 2014

Test build #22728 has finished for PR 2843 at commit a873d0f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22728/
Test FAILed.

@liancheng
Contributor Author

retest this please

@SparkQA

SparkQA commented Nov 2, 2014

Test build #22748 has started for PR 2843 at commit a873d0f.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 2, 2014

Test build #22748 has finished for PR 2843 at commit a873d0f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22748/
Test FAILed.

@liancheng
Contributor Author

retest this please

@SparkQA

SparkQA commented Nov 2, 2014

Test build #22759 has started for PR 2843 at commit a873d0f.

  • This patch merges cleanly.

@liancheng
Contributor Author

The previous test failures were caused by the flaky CliSuite. A fix has been proposed in #3060.

@SparkQA

SparkQA commented Nov 2, 2014

Test build #22759 has finished for PR 2843 at commit a873d0f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22759/
Test PASSed.

@marmbrus
Contributor

marmbrus commented Nov 2, 2014

Thanks! Merged to master.

@asfgit asfgit closed this in c9f8400 Nov 2, 2014
@liancheng liancheng deleted the get-info branch November 3, 2014 01:49
@liancheng liancheng restored the get-info branch November 5, 2014 08:12
marmbrus pushed a commit to marmbrus/spark that referenced this pull request Nov 11, 2014
This PR backports apache#2843 to branch-1.1. The key difference is that this one doesn't support Hive 0.13.1 and thus always returns `0.12.0` when `spark.sql.hive.version` is queried.

Six other commits on which apache#2843 depends were also backported:

- apache#2887 for `SessionState` lifecycle control
- apache#2675, apache#2823 & apache#3060 for major test suite refactoring and bug fixes
- apache#2164, for Parquet test suite updates
- apache#2493, for reading `spark.sql.*` configurations

Author: Cheng Lian <[email protected]>
Author: Cheng Lian <[email protected]>
Author: Michael Armbrust <[email protected]>

Closes apache#3113 from liancheng/get-info-for-1.1 and squashes the following commits:

d354161 [Cheng Lian] Provides Spark and Hive version in HiveThriftServer2 for branch-1.1
0c2a244 [Michael Armbrust] [SPARK-3646][SQL] Copy SQL configuration from SparkConf when a SQLContext is created.
3202a36 [Michael Armbrust] [SQL] Decrease partitions when testing
7f395b7 [Cheng Lian] [SQL] Fixes race condition in CliSuite
0dd28ec [Cheng Lian] [SQL] Fixes the race condition that may cause test failure
5928b39 [Cheng Lian] [SPARK-3809][SQL] Fixes test suites in hive-thriftserver
faeca62 [Cheng Lian] [SPARK-4037][SQL] Removes the SessionState instance created in HiveThriftServer2
@liancheng liancheng deleted the get-info branch November 21, 2014 04:22
asfgit pushed a commit that referenced this pull request Nov 10, 2017
…perty

## What changes were proposed in this pull request?

At the beginning, #2843 added `spark.sql.hive.version` to reveal the underlying Hive version for JDBC connections. For some time afterwards, it was used as a version identifier for the execution Hive client.

Actually, there is no Hive client for execution in Spark now, and there are no usages of HIVE_EXECUTION_VERSION anywhere in the Spark project. HIVE_EXECUTION_VERSION is set by `spark.sql.hive.version`, which is still set internally in some places or by users; this may confuse developers and users, given the similarly named HIVE_METASTORE_VERSION (`spark.sql.hive.metastore.version`).

It might be better to remove it.

## How was this patch tested?

Modified some existing unit tests.

cc cloud-fan gatorsmile

Author: Kent Yao <[email protected]>

Closes #19712 from yaooqinn/SPARK-22487.